System and method for object recognition and classification using a three-dimensional system with adaptive feature detectors

ABSTRACT

A method including imaging an object in three dimensions; binning data of the imaged object into three-dimensional regions having a predetermined size; determining a density value p of the data in each bin; inputting the p density values of the bins into a first layer of a computational system including a corresponding processing element for each of the bins; calculating an output O of the processing elements of the computational system while restricting the processing elements to have weights Wc1 connecting the processing elements to the corresponding p density values; and communicating an estimated class of the scanned object based on the calculated system outputs.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to computerized object recognition and, more particularly, to object recognition and classification using a three-dimensional (3D) system.

2. Description of the Related Art

Computerized object recognition is the process of finding or identifying an object in an image or video. Recognizing an object can include the process of classifying objects belonging to distinct classes. Object classification using computer vision can be applied to, among other things, automated production processes, security, and automotive applications.

The majority of object recognition technologies today use camera images, or another suitable two-dimensional sensor, as the input. Each image serves as an input to an object recognition algorithm, such as a neural network or another machine learning system. The image is usually fed into the algorithm as a collection of features, e.g., pixel intensities. The temporal order of such features is meaningless in the context of a single image. More importantly, the number of features can be very large, making the task of object recognition computationally very demanding. Most object recognition technologies that take 3D images as input project the collected data into a two-dimensional space and then track features as just described.

Object recognition is known to be especially difficult if the object's position and orientation are not constrained (i.e., the object may appear from an arbitrary viewing angle). In order to recognize and classify objects with a high degree of reliability, computer vision systems need to account for this variance. Reliable rotation-invariant recognition of objects has remained an unsolved problem.

SUMMARY OF THE INVENTION

In order to address these problems, among others, the present invention uses data from a 3D imaging device. Unlike traditional two-dimensional scanners, also known as line scanners, 3D imaging devices can provide highly accurate distance measurements in the spherical coordinate system.

The invention further enables the recognition task to be carried out directly in the original 3D space, rather than by projecting the original 3D data onto 2D surfaces, as is done with conventional imaging devices and 2D laser scanners.

Instead of choosing features a priori, 3D features represented as collections of adjustable parameters (weights) are made adaptive and learned from data representing the object in three dimensions.

In order to obtain a system with decreased sensitivity to the object's rotation (i.e., a "rotation invariant system"), the present invention optionally uses multiple sets of binned 3D data as inputs for the system. These binned data sets show the same object from different viewing angles and are presented to the system simultaneously for object classification.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:

FIG. 1 is a flowchart illustrating steps for calculating an output of processing elements of a computational system according to one embodiment of the invention;

FIGS. 2A and 2B illustrate a 3D scan of a vehicle and a non-vehicle, respectively;

FIGS. 3A and 3B illustrate a binned representation of the vehicle and the non-vehicle according to an embodiment of the invention;

FIG. 4 illustrates a multi-layer object classification system according to an embodiment of the invention;

FIG. 5 illustrates a non-limiting example of a matrix connection table used to correlate original indexes of binned representations to indexes of a first layer of the computational system;

FIG. 6 is a flowchart illustrating steps for calculating the processing elements of the computational system;

FIG. 7A is a flowchart illustrating steps for training the system according to an embodiment of the invention; and

FIG. 7B is a flowchart illustrating steps for training the system according to a second embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to the drawings, in which like reference numerals designate identical or corresponding parts/steps throughout the several views, FIG. 1 is a flowchart illustrating steps for calculating an output of processing elements of a computational system according to one embodiment of the invention. In step 101, an object is scanned using a 3D scanner. Imaging of an object can be accomplished in three dimensions using both scanning and non-scanning systems; the invention is not limited to a scanning system. However, a system providing highly accurate distance measurements is preferred. The 3D laser design employed by the inventor included many laser sending-receiving heads rotating around a vertical axis. The data from a 3D scanner can be represented conveniently in three spatial dimensions: the X, Y, and Z coordinates of the Cartesian coordinate system. However, the computational system of the invention can be adapted to use a different coordinate system, such as the scanner's original spherical system.

FIGS. 2A and 2B illustrate how two different classes of objects (vehicle and non-vehicle) may look when plotted in X, Y, and Z coordinates (normalized to approximately zero mean and unit standard deviation). These figures show approximately 100,000 to 200,000 data points for each object, collected over several seconds of observing the stationary objects from a moving platform equipped with the 3D scanner. The step of converting the platform-centered coordinates into X, Y, Z coordinates that are stationary, or that move with the object of interest, is understood by those of ordinary skill in the art and thus is not described here. Object motion may reduce the number of points associated with the object, but the reduced number of points does not change the operating principle of this invention.

In step 103, the original 3D laser data points are binned into pre-specified regions of the 3D space. For example, the regions can be defined as cubes of a 3D grid spanning the space occupied by the object. Alternatively, rectangular 3D boxes may be advantageous as binned regions when the coordinates have different resolutions along different axes. FIGS. 3A and 3B illustrate the objects of FIGS. 2A and 2B, respectively, after the binning operation has been performed. The size of the squares in FIGS. 3A and 3B reflects the relative density of the data points in particular bins.

In step 105, the binned 3D data are processed by a computational system with shared adjustable parameters (weights). See FIG. 4 for an embodiment of the computational system. The local density of points in each bin is denoted p(ind,i,j,k), where ind is the index of the binned 3D data and i, j, and k are the coordinates of the bin (see FIGS. 2 and 3). The index ind may correspond to a different viewing angle of the same object; e.g., ind=0 may denote the binned 3D data for the object rotated 90 degrees from its default viewing angle, and ind=1 may denote the binned 3D data for the object rotated −90 degrees from its default viewing angle. The invention does not require multiple views (indexes) and can be implemented using only the default view of the 3D imaging device. However, to obtain a rotation invariant system, the present invention uses multiple sets of binned 3D data as inputs for the system. These binned data sets show the same object from different viewing angles and are presented to the system simultaneously for object classification. This multi-view arrangement is shown in FIG. 4, where the input layer has multiple binned 3D views of the object indexed by ind.
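The additional views indexed by ind can be generated by rotating the binned data. The sketch below rotates a grid by +90 degrees about the vertical axis, which it assumes is z, by remapping bin indexes. A −90 degree view is the inverse rotation, or equivalently three +90 degree rotations. The function `rotate_z90` and the flattened layout are illustrative assumptions. Rotations by angles that are not multiples of 90 degrees would instead require rotating the points before binning.

```c
#include <assert.h>
#include <stddef.h>

/* Rotate a binned view by +90 degrees about the vertical (z) axis.
   in has nx*ny*nz bins (k fastest); out receives ny*nx*nz bins, i.e. the
   rotated view has extents nx' = ny and ny' = nx. A direction (x, y) maps
   to (-y, x), so bin (i, j, k) moves to (ny-1-j, i, k). */
static void rotate_z90(const double *in, int nx, int ny, int nz, double *out) {
    assert(nx > 0 && ny > 0 && nz > 0 && in != out);
    for (int i = 0; i < nx; i++)
        for (int j = 0; j < ny; j++)
            for (int k = 0; k < nz; k++) {
                int ir = ny - 1 - j;   /* new x index */
                int jr = i;            /* new y index; new y extent is nx */
                out[((size_t)ir * nx + jr) * nz + k] =
                    in[((size_t)i * ny + j) * nz + k];
            }
}
```

Applying the function four times returns the original grid, which is a convenient sanity check.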

The density values p are normalized over the entire set of points belonging to the same object (same index ind). Such a normalization can be done in a variety of ways, e.g., by computing the average density of points per bin and subtracting the average density from each bin's density. As an alternative, scaling by the standard deviation from the average density can be employed to normalize the density values p. Logarithmic normalization can also be employed if the binned densities differ by orders of magnitude.

The density values p of the binned 3D data are fed into the first layer of the system in step 107. The first layer includes Nc1 three-dimensional maps, each having a processing element O for each of the input bins. In each map, every processing element O is restricted to have the same values of the weights Wc1 (adjustable parameters) connecting the processing elements O with elements of the inputs p(ind,i,j,k). These weights form the receptive field, or adaptive feature vector. By restricting the weights to have the same values among different processing elements O, the invention not only dramatically reduces the computational complexity of the overall system but also creates effective, adaptive feature detectors that learn various features of the object regardless of their location in the binned 3D data.

The number of weights Wc1 depends on the size of the receptive field of the processing elements in the first layer. Without loss of generality, the size of the receptive field can be specified as dx, dy, and dz in coordinates of the binned 3D data. The processing elements O then generally implement a nonlinear transformation f (e.g., a hyperbolic tangent or a threshold) on the scalar product of the weights Wc1 and the data within the receptive field. The number of weights Wc1 also depends on the connectivity pattern between the inputs and the first layer.
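With a shared receptive field of size dx × dy × dz, the parameter count of map ic1 follows from the first-layer loop. There is one weight per receptive-field element for each of the num(ic1) connected views, plus one bias weight. The helper names are hypothetical. The second function only illustrates how many weights an unshared layer of the same shape would need.

```c
#include <assert.h>
#include <stddef.h>

/* Shared weights of map ic1: num(ic1) receptive fields of dx*dy*dz
   weights (one per connected view) plus one bias weight. */
static size_t shared_weights_per_map(int num_ic1, int dx, int dy, int dz) {
    assert(num_ic1 > 0 && dx > 0 && dy > 0 && dz > 0);
    return (size_t)num_ic1 * dx * dy * dz + 1;
}

/* For comparison only: weights needed if each of the Nc1x*Nc1y*Nc1z
   processing elements of the map had its own receptive field and bias. */
static size_t unshared_weights_per_map(int num_ic1, int dx, int dy, int dz,
                                       int nc1x, int nc1y, int nc1z) {
    return shared_weights_per_map(num_ic1, dx, dy, dz) *
           (size_t)nc1x * nc1y * nc1z;
}
```

For example, with illustrative sizes dx = dy = dz = 3 and num(ic1) = 3, a map has 3·27 + 1 = 82 shared weights. Without sharing, a 10 × 10 × 10 map of the same shape would need 82,000.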

The manner of calculating the outputs of the processing elements O is described with reference to FIG. 6. The connectivity between the input elements and the processing elements O in the first layer of the computational system can be tracked using a matrix connection_table; see the non-limiting example of a connection table in FIG. 5. The rows of connection_table are associated with the binned 3D data views and its columns with the first layer's maps. In the pseudocode below, connection_table(is0,ic1) returns the index ind of the is0-th view connected to map ic1, for is0 = 0, ..., num(ic1)−1. The vector num stores the number of connections for each map index ic1. With reference to FIG. 5, for example, num(0)=num(1)=num(2)=num(3)=num(4)=3 and num(5)=num(6)=7. In step 601, the matrix connection_table is generated. In step 603, the sizes dx, dy, and dz of the receptive field are specified.

The computations in the first layer of the system are illustrated in the pseudocode below (C language notation):

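```c
/* For every map ic1 and position (i,j,k): weighted sum over the receptive
   fields of all views connected to the map, plus a bias, passed through f.
   Nc1* + d* - 1 must equal N* (where * is x, y or z), so that the
   receptive field stays inside the binned data. */
for (ic1 = 0; ic1 < Nc1; ic1++) {
  for (i = 0; i < Nc1x; i++) {
    for (j = 0; j < Nc1y; j++) {
      for (k = 0; k < Nc1z; k++) {
        sum = 0; wind = 0;
        for (is0 = 0; is0 < num(ic1); is0++) {
          ind = connection_table(is0, ic1);
          for (i0 = i; i0 < i + dx; i0++) {
            for (j0 = j; j0 < j + dy; j0++) {
              for (k0 = k; k0 < k + dz; k0++) {
                sum = sum + p(ind, i0, j0, k0) * Wc1(ic1, wind);
                wind = wind + 1;   /* next shared weight */
              }
            }
          }
        }
        sum = sum + Wc1(ic1, wind);   /* adding bias weight */
        o(ic1, i, j, k) = f(sum);
      }
    }
  }
}
```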
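For experimentation, the pseudocode can be turned into a self-contained C function that operates on flattened arrays. Only the loop structure comes from the pseudocode. The rest are illustrative assumptions: the memory layouts (k varying fastest, connection_table stored row-major with Nc1 columns, each map's weights padded to a common stride), the `LayerDims` type, and the `identity` helper.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*Activation)(double);

typedef struct {
    int nx, ny, nz;   /* bins per input view: Nx, Ny, Nz */
    int dx, dy, dz;   /* receptive-field size */
} LayerDims;

/* Linear activation, useful for checking the weighted sums. */
static double identity(double s) { return s; }

/* First layer with shared weights.
   p    : views stored one after another, each nx*ny*nz with k fastest
   conn : conn[is0*nc1 + ic1] == connection_table(is0, ic1)
   num  : num[ic1] == number of views connected to map ic1
   w    : weights of map ic1 start at w[ic1*w_stride]; the entry after the
          receptive-field weights is the bias
   o    : nc1 maps of Nc1x*Nc1y*Nc1z outputs, with Nc1x = Nx - dx + 1, etc. */
static void first_layer(const double *p, const LayerDims *d,
                        const int *conn, const int *num, int nc1,
                        const double *w, size_t w_stride,
                        Activation f, double *o) {
    const int ox = d->nx - d->dx + 1, oy = d->ny - d->dy + 1,
              oz = d->nz - d->dz + 1;
    const size_t view_size = (size_t)d->nx * d->ny * d->nz;
    assert(ox > 0 && oy > 0 && oz > 0);
    for (int ic1 = 0; ic1 < nc1; ic1++) {
        const double *wc = w + (size_t)ic1 * w_stride;  /* shared by all (i,j,k) */
        for (int i = 0; i < ox; i++)
            for (int j = 0; j < oy; j++)
                for (int k = 0; k < oz; k++) {
                    double sum = 0.0;
                    size_t wind = 0;
                    for (int is0 = 0; is0 < num[ic1]; is0++) {
                        const double *pv = p + (size_t)conn[is0 * nc1 + ic1] * view_size;
                        for (int i0 = i; i0 < i + d->dx; i0++)
                            for (int j0 = j; j0 < j + d->dy; j0++)
                                for (int k0 = k; k0 < k + d->dz; k0++)
                                    sum += pv[((size_t)i0 * d->ny + j0) * d->nz + k0] * wc[wind++];
                    }
                    assert(wind < w_stride);   /* room left for the bias */
                    sum += wc[wind];           /* bias weight */
                    o[(((size_t)ic1 * ox + i) * oy + j) * oz + k] = f(sum);
                }
    }
}
```

Passing `tanh` from `<math.h>` as `f` gives hyperbolic-tangent processing elements. `identity` is handy for checking the raw weighted sums.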

The computations above can be repeated several times, thereby realizing a multi-layer processing system (FIG. 4). The system output estimates the class label. The system output is connected to all elements of the previous layer (see FIG. 4) and implements a nonlinear function, e.g., a hyperbolic tangent or a threshold, on the scalar product between its inputs and its weights. See step 605. Without loss of generality, a two-class classification problem is described, requiring just one output of the system. However, three or more classes can be estimated: the system can recognize more classes by adding more output elements, as shown in FIG. 4.

The system can also have sub-sampling layers, which implement the mathematical operation of spatial sub-sampling. See step 607. This amounts to reducing the resolution of the binned 3D data by, for example, half for each coordinate of the binned space, i.e., averaging the outputs O(ic1,i,j,k) over blocks of two adjacent elements in each of the three dimensions. Mathematically, and as an example, the output of the sub-sampling layer for the block whose lower corner is (i,j,k), with i, j, and k even so that blocks do not overlap, is

$$\frac{1}{8}\sum_{a=0}^{1}\sum_{b=0}^{1}\sum_{c=0}^{1} O(ic1,\, i+a,\, j+b,\, k+c).$$

Similar to the outputs O(ic1,i,j,k), the local densities p(ind,i0,j0,k0) can also be sub-sampled before being fed as inputs into the first layer maps, if desired.
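The sub-sampling step can be sketched as non-overlapping 2 × 2 × 2 averaging of one map. Output element (i,j,k) averages the input block whose lower corner is (2i,2j,2k). Dropping a trailing odd slice when a dimension is odd is an assumption, as is the name `subsample2`.

```c
#include <assert.h>
#include <stddef.h>

/* Average non-overlapping 2x2x2 blocks of one map (nx*ny*nz, k fastest).
   out receives (nx/2)*(ny/2)*(nz/2) elements; output (i,j,k) averages the
   block with lower corner (2i,2j,2k). An odd trailing slice is dropped. */
static void subsample2(const double *in, int nx, int ny, int nz, double *out) {
    assert(nx >= 2 && ny >= 2 && nz >= 2);
    int mx = nx / 2, my = ny / 2, mz = nz / 2;
    for (int i = 0; i < mx; i++)
        for (int j = 0; j < my; j++)
            for (int k = 0; k < mz; k++) {
                double s = 0.0;
                for (int a = 0; a < 2; a++)
                    for (int b = 0; b < 2; b++)
                        for (int c = 0; c < 2; c++)
                            s += in[((size_t)(2 * i + a) * ny + (2 * j + b)) * nz
                                    + (2 * k + c)];
                out[((size_t)i * my + j) * mz + k] = s / 8.0;
            }
}
```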

By way of example, the computational system can be applied to the two-class (vehicle and non-vehicle) problem illustrated in FIGS. 2A and 2B. According to one embodiment of the invention, if the network output of the two-class system is above zero, the estimated class label is declared to belong to the vehicle class. If the network output is equal to or below zero, the estimated class label is declared to belong to the non-vehicle class.
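The output element and the two-class decision rule can be sketched together. A single output unit applies the nonlinearity to the scalar product of all last-layer elements with its weights, and the sign of the result selects the class. The function and type names are hypothetical. With f = tanh, the decision is the same as thresholding the weighted sum itself, because tanh preserves sign.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NON_VEHICLE = 0, VEHICLE = 1 } ObjectClass;

/* Linear activation, useful for checking; pass tanh from <math.h> instead
   for a hyperbolic-tangent output. */
static double identity(double s) { return s; }

/* Fully connected output element: f(bias + sum of w[m] * x[m]) over all
   n elements x of the last layer. */
static double output_unit(const double *x, const double *w, size_t n,
                          double bias, double (*f)(double)) {
    assert(f != NULL);
    double s = bias;
    for (size_t m = 0; m < n; m++) s += w[m] * x[m];
    return f(s);
}

/* Two-class rule: output above zero -> vehicle; zero or below -> non-vehicle. */
static ObjectClass decide(double output) {
    return output > 0.0 ? VEHICLE : NON_VEHICLE;
}
```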

FIG. 7A is a flowchart illustrating steps for training the system according to an embodiment of the invention (supervised training). The first step (step 701) is to obtain a representative set of examples for each class of objects to be trained. For each class, the system can be trained (step 703) using, for example, the gradient descent algorithm until a suitable error value is reached (e.g., until the root-mean-square error has decreased sufficiently on a validation data set). Other training techniques are described in "Training Recurrent Neurocontrollers for Robustness With Derivative-Free Kalman Filter" by Danil V. Prokhorov, which is hereby incorporated by reference.

In step 705, the trained system is tested on a set of data unseen by the system during the training process. The system is considered acceptable if the value of the test error function is close to that of the training error function (decision 707). Otherwise, more training examples should be added and the training process repeated.

As an alternative training embodiment, the supervised training described above can be combined with the unsupervised training procedure described below. The goal of the unsupervised training is to recover, from the training data, the most useful adaptive features, represented by the sets of weights Wc1 for ic1 = 0, 1, ..., Nc1−1. Because such training does not use the class label information, it is called unsupervised training.

FIG. 7B is a flowchart illustrating steps for training the system using unsupervised training. In step 751, several sets of weights Wc1(num), for num = 1, 2, ..., Nu, are initialized. This can be done in a variety of ways, randomly or purposefully. A purposeful process would use prior information for assigning the initial Wc1(num); e.g., it might be known that a particular feature exists in the data, so that feature can be assigned to one of the sets Wc1. A random process would not use any prior information; e.g., all weights Wc1 might be set to random values in a particular range. The parameter age(num) is set equal to 1 for all num. In step 753, an element (i,j,k) of the binned data is chosen randomly. For each num, the output O(num) is computed in step 755. The computation includes (i) computing the scalar product Wc1(num)*p(ind) between the appropriate elements of the receptive field Wc1(num) and the data elements p(ind) centered around (i,j,k); (ii) computing the Euclidean norms of Wc1(num) and p(ind); and (iii) computing O(num) = f(scalar_product(Wc1(num)*p(ind)) / (norm(Wc1(num))*norm(p(ind)))). In step 757, the maximum output O(num_max) is chosen from the outputs O generated for num = 1, 2, ..., Nu.

In step 759, Wc1(num_max) is adjusted according to the following equations, which implement a form of the well-known Hebbian adaptation rule. Other clustering methods can also be used.

- determining mu:
    - mu = 0, if age(num_max) ≤ t1;
    - mu = c*(age(num_max) − t1)/(t2 − t1), if t1 < age(num_max) ≤ t2;
    - mu = c + (age(num_max) − t2)/r, if t2 < age(num_max);
  where c, r, t1, and t2 are design parameters (e.g., c=2, r=2000, t1=20, t2=200);
- p1(t) = (age(num_max) − 1 − mu)/age(num_max), p2(t) = (1 + mu)/age(num_max);
- Wc1(num_max,index) = p1(t)*Wc1(num_max,index) + p2(t)*p(ind,index)*O(num_max).
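The update of step 759 can be sketched directly from the equations above. The text does not say when age(num_max) changes. This sketch assumes it is incremented after each update, which is what makes the age-based selection of weight sets meaningful. `AmnesicParams` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Design parameters of the mu schedule (text example: c=2, r=2000,
   t1=20, t2=200). */
typedef struct { double c, r, t1, t2; } AmnesicParams;

/* Piecewise mu as a function of age(num_max). */
static double amnesic_mu(double age, const AmnesicParams *ap) {
    if (age <= ap->t1) return 0.0;
    if (age <= ap->t2) return ap->c * (age - ap->t1) / (ap->t2 - ap->t1);
    return ap->c + (age - ap->t2) / ap->r;
}

/* Wc1(num_max,index) = p1*Wc1(num_max,index) + p2*p(ind,index)*O(num_max)
   for all len weights of the winner; x is the input patch, o = O(num_max).
   Incrementing age afterwards is an assumption of this sketch. */
static void hebbian_update(double *w, const double *x, size_t len, double o,
                           double *age, const AmnesicParams *ap) {
    assert(*age >= 1.0);
    double mu = amnesic_mu(*age, ap);
    double p1 = (*age - 1.0 - mu) / *age;
    double p2 = (1.0 + mu) / *age;
    for (size_t m = 0; m < len; m++) w[m] = p1 * w[m] + p2 * x[m] * o;
    *age += 1.0;
}
```

With age = 1 the schedule gives mu = 0, p1 = 0, and p2 = 1, so the first update simply copies the scaled input patch into the winning weight set.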

The unsupervised training process is repeated, starting with step 753, until little change in Wc1 is observed; for example, until Norm(ΔWc1) < epsilon, where ΔWc1 is the change in Wc1 and epsilon is a small positive number. In this unsupervised training algorithm, the parameter age is used to select Nc1 sets of weights Wc1 with sufficiently large age. This is done by ordering the ages of all Nu processing elements and discarding the Nu−Nc1 elements whose age is smaller than a threshold. The selected sets of weights Wc1 remain fixed (constant) in the subsequent supervised training, in which the weights of the other layers of the structure are trained in a supervised fashion until adequate performance is attained.

The computational system of the present invention can be implemented using a microprocessor or its equivalent. The microprocessor utilizes a computer readable storage medium, such as a memory (e.g., ROM, EPROM, EEPROM, flash memory, static memory, DRAM, SDRAM, and their equivalents), configured to control the microprocessor to perform the methods of the present invention. In an alternate embodiment, the microprocessor may further include, or may exclusively include, a logic device for augmenting or fully implementing the present invention. Such a logic device includes, but is not limited to, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a generic array of logic (GAL), and their equivalents. The microprocessor can be a separate device or a single processing mechanism.

Obviously, numerous modifications and variations of the present invention are possible in light of the above teachings. It is therefore to be understood that, within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.

The invention claimed is:
1. A method, comprising: imaging an object in three-dimensions by collecting three-dimensional data points of the object; binning the data points of the imaged object into three-dimensional bins having a predetermined three-dimensional size to create a binned scanned object in a three-dimensional bin space; determining a density value of the data in each of the bins by calculating a number of the data points in each of the bins; inputting the density value of each of the bins into a first layer of a computational system including a corresponding processing element for each of the bins; calculating outputs of the processing elements, wherein all of the processing elements have weights of a same value connecting the processing elements to corresponding density values; and communicating an estimated class of the scanned object based on the calculated outputs, wherein the calculating includes: performing a nonlinear transformation on a product of a connected weight and the density value for each bin; sub-sampling the outputs by averaging the outputs over two adjacent processing elements in three dimensions; and repeating the calculating process using additional layers of the computational system, the output of the computational system connected to all processing elements of a last layer of the computational system.
2. The method of claim 1, wherein the data of the imaged object in three-dimensions is represented in a Cartesian coordinate system.
3. The method of claim 1, wherein the data of the imaged object in three-dimensions is represented in a spherical system.
4. The method of claim 1, wherein the nonlinear transformation includes applying a hyperbolic tangent to the product of the connected weight and the density value for each bin.
5. The method of claim 1, wherein a number of the weights depends on a connectivity pattern between the input density values and the processing elements in the first layer of the computational system.
6. The method of claim 1, wherein the estimated class of the scanned object is selected by the computational system from a group including at least two candidate classes.
7. The method of claim 1, wherein the imaging is accomplished with a three-dimensional scanning system.
8. The method of claim 1, wherein the imaging is accomplished with a three-dimensional non-scanning system.
9. A non-transitory computer readable medium having computer code stored thereon, the computer code, when executed by a computer, causing the computer to implement the method according to claim 1.
10. The method of claim 1, further comprising: rotating the binned scanned object, creating N additional views of the binned scanned object, each view including the bins of the data points; and determining weights for each of the N+1 views, wherein the computational system includes three-dimensional maps, each map includes processing elements for one of the N+1 views, and each of the processing elements for one of the maps is connected to the corresponding density values of the corresponding view via the determined weights of the corresponding map.
11. The method of claim 10, further comprising: normalizing the density values separately for each of the N+1 views.
12. A method, comprising: imaging an object in three-dimensions by collecting three-dimensional data points of the object; binning the data points of the imaged object into three-dimensional bins having a predetermined three-dimensional size to create a binned scanned object in a three-dimensional bin space; determining a density value of the data in each of the bins by calculating a number of the data points in each of the bins; inputting the density value of each of the bins into a first layer of a computational system including a corresponding processing element for each of the bins; and calculating outputs of the processing elements, wherein all of the processing elements have weights of a same value connecting the processing elements to corresponding density values, wherein the weights are obtained through unsupervised training of the computational system, and the unsupervised training includes: initializing a plurality of weights Wc1(num) for num = 1, 2, ..., Nu, wherein Nu is a predetermined integer; for each num, computing an output O for a randomly selected bin of data; determining a num_max corresponding to a maximum computed output O; adjusting the weight Wc1(num_max) to implement an unsupervised training algorithm; and repeating the computing, determining, and adjusting steps until changes in the weights Wc1 are smaller than a predetermined value.
13. The method of claim 12, wherein the adjusting includes implementing a form of the Hebbian adaptation rule.
14. The method of claim 12, wherein the computing the output O includes: computing a scalar product for each density value of bins adjacent the randomly selected bin of data, referred to as p(ind), by calculating Wc1(num)*p(ind); and computing Euclidean norms of Wc1(num) and p(ind), wherein the computed output O of the randomly selected bin is a function of the computed scalar products and the computed Euclidean norms.
15. The method of claim 12, wherein the calculating includes: performing a nonlinear transformation on a product of a connected weight and the density value for each bin; sub-sampling the outputs by averaging the outputs over two adjacent processing elements in three dimensions; and repeating the calculating process using additional layers of the computational system, the output of the computational system connected to all processing elements of a last layer of the computational system.
16. A non-transitory computer readable medium having computer code stored thereon, the computer code, when executed by a computer, causing the computer to implement the method according to claim 12.
17. A method, comprising: imaging an object in three-dimensions by collecting three-dimensional data points of the object; binning the data points of the imaged object into three-dimensional bins having a predetermined three-dimensional size to create a binned scanned object in a three-dimensional bin space; determining a density value of the data in each of the bins by calculating a number of the data points in each of the bins; inputting the density value of each of the bins into a first layer of a computational system including a corresponding processing element for each of the bins; and calculating outputs of the processing elements, wherein all of the processing elements have weights of a same value connecting the processing elements to corresponding density values, wherein the weights are obtained through supervised training of the computational system, and the supervised training includes: obtaining a representative set of examples of objects with the classes; training the system using a supervised training algorithm until a predetermined error value is reached; testing the trained system on a set of unknown objects to obtain an error value; determining whether the error value is within tolerance of a training error; and repeating the supervised training process until the error value is within the tolerance of the training error.
18. The method of claim 17, wherein the supervised training algorithm is a form of an incremental optimization algorithm.
19. The method of claim 17, wherein the calculating includes: performing a nonlinear transformation on a product of a connected weight and the density value for each bin; sub-sampling the outputs by averaging the outputs over two adjacent processing elements in three dimensions; and repeating the calculating process using additional layers of the computational system, the output of the computational system connected to all processing elements of a last layer of the computational system.
20. A non-transitory computer readable medium having computer code stored thereon, the computer code, when executed by a computer, causing the computer to implement the method according to claim 17.